Collaborating Authors

 Msida


The Procedural Content Generation Benchmark: An Open-source Testbed for Generative Challenges in Games

arXiv.org Artificial Intelligence

This paper introduces the Procedural Content Generation Benchmark for evaluating generative algorithms on different game content creation tasks. The benchmark comes with 12 game-related problems with multiple variants on each problem. Problems vary from creating levels of different kinds to creating rule sets for simple arcade games. Each problem has its own content representation, control parameters, and evaluation metrics for quality, diversity, and controllability. This benchmark is intended as a first step towards a standardized way of comparing generative algorithms. We use the benchmark to score three baseline algorithms: a random generator, an evolution strategy, and a genetic algorithm. Results show that some problems are easier to solve than others, as well as the impact the chosen objective has on quality, diversity, and controllability of the generated artifacts.


Comparative Analysis of Image, Video, and Audio Classifiers for Automated News Video Segmentation

arXiv.org Artificial Intelligence

News videos require efficient content organisation and retrieval systems, but their unstructured nature poses significant challenges for automated processing. This paper presents a comprehensive comparative analysis of image, video, and audio classifiers for automated news video segmentation. We develop and evaluate multiple deep learning approaches, including ResNet, ViViT, AST, and multimodal architectures, to classify five distinct segment types: advertisements, stories, studio scenes, transitions, and visualisations. Using a custom-annotated dataset of 41 news videos comprising 1,832 scene clips, our experiments demonstrate that image-based classifiers achieve superior performance (84.34% accuracy) compared to more complex temporal models. Notably, the ResNet architecture outperformed state-of-the-art video classifiers while requiring significantly fewer computational resources. Binary classification models achieved high accuracy for transitions (94.23%) and advertisements (92.74%). These findings advance the understanding of effective architectures for news video segmentation and provide practical insights for implementing automated content organisation systems in media applications such as media archiving, personalised content delivery, and intelligent video search.


Enhanced Smart Contract Reputability Analysis using Multimodal Data Fusion on Ethereum

arXiv.org Artificial Intelligence

The evaluation of smart contract reputability is essential to foster trust in decentralized ecosystems. However, existing methods that rely solely on static code analysis or transactional data offer limited insight into evolving trustworthiness. We propose a multimodal data fusion framework that integrates static code features with transactional data to enhance reputability prediction. Our framework initially focuses on static code analysis, utilizing GAN-augmented opcode embeddings to address class imbalance, achieving 97.67% accuracy and a recall of 0.942 in detecting illicit contracts, surpassing traditional oversampling methods. This forms the crux of a reputability-centric fusion strategy, where combining static and transactional data improves recall by 7.25% over single-source models, demonstrating robust performance across validation sets. By providing a holistic view of smart contract behaviour, our approach enhances the model's ability to assess reputability, identify fraudulent activities, and predict anomalous patterns. These capabilities contribute to more accurate reputability assessments, proactive risk mitigation, and enhanced blockchain security.


Can Large Language Models Capture Video Game Engagement?

arXiv.org Artificial Intelligence

Can out-of-the-box pretrained Large Language Models (LLMs) detect human affect successfully when observing a video? To address this question, for the first time, we comprehensively evaluate the capacity of popular LLMs to annotate and successfully predict continuous affect annotations of videos when prompted by a sequence of text and video frames in a multimodal fashion. In particular, we test LLMs' ability to correctly label changes of in-game engagement in 80 minutes of annotated videogame footage from 20 first-person shooter games of the GameVibe corpus. We run over 2,400 experiments to investigate the impact of LLM architecture, model size, input modality, prompting strategy, and ground truth processing method on engagement prediction. Our findings suggest that while LLMs rightfully claim human-like performance across multiple domains, they generally fall behind in capturing continuous experience annotations provided by humans. We examine some of the underlying causes for the relatively poor overall performance, highlight the cases where LLMs exceed expectations, and draw a roadmap for the further exploration of automated emotion labelling via LLMs.


Graph Based Traffic Analysis and Delay Prediction

arXiv.org Artificial Intelligence

This research focuses on traffic congestion on the small island of Malta, which is the most densely populated country in the EU with about 1,672 inhabitants per square kilometre (4,331 inhabitants/sq mi). Furthermore, the number of vehicles in Malta is growing rapidly. Based on our research, the number of vehicles increased by around 11,000 in a little more than 6 months, which shows how important it is to have an accurate and comprehensive means of collecting data to tackle the issue of fluctuating traffic in Malta. In this paper, we first present the newly built comprehensive traffic dataset, called MalTra. This dataset includes realistic trips made by members of the public across the island over a period of 200 days. We then describe the methodology we adopted to generate synthetic data to complete our dataset as much as possible. In our research, we consider both MalTra and the Q-Traffic dataset, which has been used in several other research studies. The statistical ARIMA model and two graph neural networks, the spatial temporal graph convolutional network (STGCN) and the diffusion convolutional recurrent network (DCRNN), were used to analyse and compare the results with existing research. From the evaluation, we found that the DCRNN model outperforms the STGCN, with the former achieving an MAE of 3.98 (against 6.65 for the latter) and an RMSE of 7.78 (against 12.73 for the latter).
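
The two error metrics quoted in this abstract, MAE and RMSE, can be illustrated with a minimal sketch. The travel-time values below are invented for illustration and do not come from MalTra or Q-Traffic; the function names are likewise hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but large errors are penalised more."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Made-up observed and predicted values (illustrative only).
observed = [10.0, 12.0, 15.0, 20.0]
predicted = [12.0, 12.0, 11.0, 20.0]

mae_value = mae(observed, predicted)
rmse_value = rmse(observed, predicted)
print(f"MAE={mae_value:.2f} RMSE={rmse_value:.2f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which is consistent with the figures reported for both models.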


Affectively Framework: Towards Human-like Affect-Based Agents

arXiv.org Artificial Intelligence

Game environments offer a unique opportunity for training virtual agents due to their interactive nature, which provides diverse play traces and affect labels. Despite their potential, no reinforcement learning framework incorporates human affect models as part of their observation space or reward mechanism. To address this, we present the Affectively Framework, a set of OpenAI Gym environments that integrate affect as part of the observation space. This paper introduces the framework and its three game environments and provides baseline experiments to validate its effectiveness and potential. Video games are ideal stimuli for research in Affective Computing [1] for several reasons. Firstly, the user is free to play in many different ways, leading to diversity in their play traces and emotional experiences [2].


Closing the Affective Loop via Experience-Driven Reinforcement Learning Designers

arXiv.org Artificial Intelligence

Autonomously tailoring content to a set of predetermined affective patterns has long been considered the holy grail of affect-aware human-computer interaction at large. In this paper, we propose a novel reinforcement learning (RL) framework for generating affect-tailored content, and we test it in the domain of racing games. Specifically, the experience-driven RL (EDRL) framework is given a target arousal trace, and it then generates a racetrack that elicits the desired affective responses for a particular type of player. EDRL leverages a reward function that assesses the affective pattern of any generated racetrack from a corpus of arousal traces. Our findings suggest that EDRL can accurately generate affect-driven racing game levels according to a designer's style and outperforms search-based methods for personalised content generation. The method is not only directly applicable to game content generation tasks but also employable broadly to any domain that uses content for affective adaptation. Figure: Two examples of maximally and minimally arousing tracks generated by EDRL for the Solid Rally racing game.


AI as a Tool for Fair Journalism: Case Studies from Malta

arXiv.org Artificial Intelligence

In today's media landscape, the role of Artificial Intelligence (AI) in shaping societal perspectives and journalistic integrity is becoming increasingly apparent. This paper presents two case studies centred on Malta's media market featuring technical novelty. Despite its relatively small scale, Malta offers invaluable insights applicable to both similar and broader media contexts. These two projects focus on media monitoring and present tools designed to analyse potential biases in news articles and television news segments. The first project uses Computer Vision and Natural Language Processing techniques to analyse the coherence between images in news articles and their corresponding captions, headlines, and article bodies. The second project employs computer vision techniques to track individuals' on-screen time or visual exposure in news videos, providing queryable data. These initiatives aim to contribute to society by providing both journalists and the public with the means to identify biases. Furthermore, we make these tools accessible to journalists to improve the trustworthiness of media outlets by offering robust tools for detecting and reducing bias.


ColorFoil: Investigating Color Blindness in Large Vision and Language Models

arXiv.org Artificial Intelligence

With the utilization of Transformer architecture, large Vision and Language (V&L) models have shown promising performance in even zero-shot settings. Several studies, however, indicate a lack of robustness of the models when dealing with complex linguistics and visual attributes. In this work, we introduce a novel V&L benchmark, ColorFoil, by creating color-related foils to assess the models' perception ability to detect colors like red, white, green, etc. We evaluate seven state-of-the-art V&L models including CLIP, ViLT, GroupViT, and BridgeTower, etc. in a zero-shot setting and present intriguing findings from the V&L models. In this benchmark, foils are generated from the existing V&L datasets for each of the tasks. A foil is referred to as a distractor or slightly incorrect example that is passed along with the correct example to the V&L model to assess the model's ability to correctly distinguish them [17, 22]. Although the existing V&L benchmarks like VALSE help the community to test the capabilities of V&L models, there is still much work to be done to evaluate the robustness and generalizability of the models on numerous other tasks. It remains unknown how well the large V&L models can perceive colors from ...


Dynamic Quality-Diversity Search

arXiv.org Artificial Intelligence

Evolutionary search via the quality-diversity (QD) paradigm can discover highly performing solutions in different behavioural niches, showing considerable potential in complex real-world scenarios such as evolutionary robotics. Yet most QD methods only tackle static tasks that are fixed over time, which is rarely the case in the real world. Unlike noisy environments, where the fitness of an individual changes slightly at every evaluation, dynamic environments simulate tasks where external factors at unknown and irregular intervals alter the performance of the individual with a severity that is unknown a priori. Literature on optimisation in dynamic environments is extensive, yet such environments have not been explored in the context of QD search. This paper first introduces a novel and generalisable Dynamic QD methodology that aims to keep the archive of past solutions updated in the case of environment changes. Secondly, we present a novel characterisation of dynamic environments that can be easily applied to well-known benchmarks, with minor interventions to move them from a static task to a dynamic one. Our Dynamic QD intervention is applied on MAP-Elites and CMA-ME, two powerful QD algorithms, and we test the dynamic variants on different dynamic tasks.
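
To make the idea of keeping a QD archive up to date more concrete, here is a toy sketch of a MAP-Elites loop on a one-dimensional problem whose optimum moves once during the run. It is not the paper's Dynamic QD method: the fitness function, the niche definition, the single scheduled environment change, and the naive "re-score every elite" update are all assumptions made for illustration.

```python
import random


def fitness(x, shift):
    """Toy fitness with its peak at x == shift (the 'environment' parameter)."""
    return -(x - shift) ** 2


def niche(x, bins=10):
    """Behaviour descriptor: index of the equal-width cell of [0, 1) containing x."""
    return min(int(x * bins), bins - 1)


def try_insert(archive, x, shift):
    """MAP-Elites rule: keep x if its cell is empty or it beats the stored elite."""
    cell, f = niche(x), fitness(x, shift)
    if cell not in archive or f > archive[cell][1]:
        archive[cell] = (x, f)


def reevaluate(archive, shift):
    """Naive update after an environment change: re-score every stored elite."""
    for cell, (x, _) in archive.items():
        archive[cell] = (x, fitness(x, shift))


def run(generations=200, change_at=100, seed=0):
    rng = random.Random(seed)
    archive, shift = {}, 0.2
    for g in range(generations):
        if g == change_at:  # hypothetical, known change point
            shift = 0.8
            reevaluate(archive, shift)
        if archive:
            # Mutate a randomly chosen elite and clamp it to the search range.
            parent = rng.choice(list(archive.values()))[0]
            child = min(max(parent + rng.gauss(0, 0.1), 0.0), 0.999)
        else:
            child = rng.random()
        try_insert(archive, child, shift)
    return archive, shift


archive, final_shift = run()
print(f"{len(archive)} cells filled; environment shift is now {final_shift}")
```

Without the `reevaluate` step, elites stored before the change would keep stale fitness values and could block better solutions from entering their cells. In the setting the abstract describes, changes occur at unknown intervals, so a real method would also need some way to detect them rather than relying on a fixed `change_at`.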